RO CENTER FOR GENETIC ENGINEERING

GENOME SEQUENCING PROCEDURE BY HYBRIDIZATION
WITH OLIGONUCLEOTIDE PROBES

a) Technical Field

The present invention is in the field of molecular biology. In the international patent classification, it belongs in class [...].

b) Technical Problem

The determination of the sequence of [...] genomes, particularly the human genome, is one of the challenges of the end of the 20th century. [...] for characteristic species of the living world. This would provide a [...] knowledge of the functions and evolution of organisms. It would also represent a major jump in the [...] of [...] diseases, in food production and in biotechnology in general.

c) State of the Art

The technology of recombinant DNA has made it possible to replicate [...] fragments of genomic DNA, from 200 to 50,000 bp, in amounts sufficient for determining the order of the nucleotides in the cloned [...]. [...] polyacrylamide gels capable of separating DNA fragments of 1 to a maximum of 500 bp that differ in length by only one nucleotide. [...] by specific chemical degradation of the DNA strand at the particular nucleotide by the Maxam-Gilbert method (Maxam, A.M., and Gilbert, W., Proc. Natl. Acad. Sci. 74, 560 (1977)), or by enzymatic DNA synthesis on the complementary strand, which involves the addition of a dideoxynucleotide capable of stopping the synthesis at the site where that nucleotide is located, by the method of Sanger (Sanger, F., et al., Proc. Natl. Acad. Sci. 74, 5463 (1977)). Both methods require a considerable amount of manual work, so that the rate of sequencing in good laboratories throughout the world is about 100 bp per day per person. By use of computers and robots, sequencing can be accelerated by a few orders of magnitude [...].



[...] genome has been discussed at many scientific meetings in the United States (Science 232, Research News, [...] (1986)). The conclusion is that sequencing can be accomplished only in well-organized centers (sequencing "factories"), that the cost would be about 1 billion dollars and that the task would take at least 10 years. The Japanese are currently ahead of all others in organizing elements of such a center. Their sequencing center has a capacity of about one million bp per day, the cost being 0.17 dollar per genomic bp (Nature 325, Commentary, 771-772 (1987)). Because the random selection of cloned fragments containing about 300 bp requires sequencing three genome lengths, the sequencing of 10 billion bp in such a center would take 30 years; in other words, to sequence the human genome alone in a few years, at least 10 such centers would be needed.



d) Description



Our sequencing procedure has an entirely different logic and is applicable only to the determination of the sequences of entire genomes; it is uneconomical for the determination of specific short fragments. The procedure is based on strictly specific hybridization of oligonucleotide probes (ONPs) that are 10 to [...] nucleotides long. Because hybridization conditions can be determined under which ONPs hybridize only to sequences with complete homology, the sequence can be read by such hybridization. By hybridizing the entire genomic DNA, replicated in fragments of appropriate length, with a sufficient number of ONPs and by computerized arrangement of the detected sequences, the entire genome can be sequenced at the same time. We believe that this procedure is several times faster and less expensive than the procedure now being developed, and that for this reason it could be applicable to the sequencing of the genomes of all characteristic species.

For this procedure, it is necessary to optimize the length, sequence and number of the ONPs, the length of the genomic DNA fragments that represent a hybridization point, and the method of separate replication of each such fragment.

The number of possible arrangements of the four nucleotides as a function of length n is equal to 4^n, and for selected lengths it is shown in the following table:

Length (bp)   9         10          11          12           13
Number        262,144   1,048,576   4,194,304   16,777,216   67,108,864

On the basis of the foregoing, to detect every possible sequence, namely to accomplish a displacement of only one bp in the ONP arrangement, it is necessary to use from about 260,000 9-mers to about 67,100,000 13-mers. Because specific hybridization is likely to be achieved only with an ONP of 10 nucleotides or more (Wallace, R.B., et al., Nucleic Acids Research 6, 3543-3557 (1979)), the number of required ONPs will be smallest if 10-mers are used.



Because genomic DNA is double-stranded, one ONP detects two different sequences [...] when it is not palindromic. For example, the ONP 5'CAGA3' detects both the sequence 5'CAGA3' / 3'GTCT5' and the sequence 5'TCTG3' / 3'AGAC5'.

For this reason, only one half of the [...] ONPs are needed. Palindromic ONPs [...] are palindromic. Conversely, this means that the frequency of nonpalindromic probes is [...].
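The probe-count argument can be checked with a short Python sketch (an illustration, not part of the original procedure; the function names are hypothetical). It assumes that one probe detects both a sequence and its reverse complement in double-stranded DNA, so non-palindromic n-mers pair up and only palindromes stand alone. It relies on the standard fact that only even-length n-mers can equal their own reverse complement, and that there are 4^(n/2) such n-mers.

```python
from itertools import product

# Watson-Crick complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_counts_brute(n: int) -> tuple[int, int, int]:
    """Enumerate all n-mers and return (all, palindromic, needed).

    Assumption: one probe covers a sequence and its reverse complement,
    so non-palindromic n-mers pair up two by two.
    """
    total = palindromic = 0
    for letters in product("ACGT", repeat=n):
        s = "".join(letters)
        total += 1
        if s == reverse_complement(s):
            palindromic += 1
    return total, palindromic, (total + palindromic) // 2


def probe_counts(n: int) -> tuple[int, int, int]:
    """Closed form: 4**n n-mers; 4**(n // 2) palindromes for even n, none for odd n."""
    total = 4 ** n
    palindromic = 4 ** (n // 2) if n % 2 == 0 else 0
    return total, palindromic, (total + palindromic) // 2


for n in range(9, 14):
    print(n, probe_counts(n))
```

Under this assumption, n = 11 gives 4,194,304 n-mers and 2,097,152 probes, while n = 10 gives 1,048,576 n-mers, 1,024 palindromes and 524,800 probes.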

For unequivocal sequence determination, it is not necessary to utilize all ONPs of a given length. The use of a smaller genomic fragment as a hybridization point makes it possible to use fewer probes. In this case, the probe overlap will be shorter but will still suffice, so that in the short genomic DNA fragment, overlaps of the given lengths will not be repeated many times.

From the average distance (S) between sequences complementary to one ONP, which depends on the ONP length and on the ratio of its dinucleotide composition to the dinucleotide composition of the genomic DNA being sequenced, it is possible to determine the frequency of the given sequence in a given length of DNA with an equation derived on the basis of probability theory (Drmanac, R., et al., Nucleic Acids Research 14, 4691-4692 (1986), and our Patent Application No. 3742 of March 24, 1987). Table 1 shows the average distance between sequences of certain homologous ONPs in mammalian genomes.
Table 1

Average Distance (S) Between Sequences of
Homologous ONPs in Mammalian Genomes

[...] of length D that contain the sequence recognized by the ONP at least once.



The results are presented in Table 2.

Table 2

Percentage of 5000 to 20,000 bp-Long Genomic
DNA Fragments Containing Sequences of a Complementary ONP
with S Equal to 25,000 to 200,000 bp




By using the binomial distribution, we determined the probability that the sequence complementary to a given ONP is repeated a certain number of times in a DNA fragment of defined length. This probability depends on the average distance (S) between occurrences of the given sequence in the genome. It is calculated by the following equation:



$$P(N) = C(D,N)\left(\frac{1}{S}\right)^{N}\left(1-\frac{1}{S}\right)^{D-N} \qquad (2)$$



where D is the length of the DNA fragment in bp, N is the number of repeats of the given sequence within the length D whose probability P(N) is being sought, and C(D,N) is the number of combinations without repetition of N out of D elements. Because within length D there are approximately D sequences which on average have the same S, multiplying the above probability by D gives the number of different sequences of a defined S that are repeated N times within length D. The calculated numbers of different sequences for given ranges of S, D and N are presented in Table 3.
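Equation (2) can be evaluated directly. The sketch below assumes, as the binomial model implies, that every position of the fragment is an independent trial with success probability 1/S; the helper names are hypothetical, and the Poisson value 1 - exp(-D/S) is printed only for comparison.

```python
from math import comb, exp


def p_repeats(n: int, d: int, s: float) -> float:
    """Equation (2): probability that a sequence with average spacing s bp
    occurs exactly n times in a d-bp fragment.
    Assumption: each position is an independent trial with success 1/s."""
    p = 1.0 / s
    return comb(d, n) * p ** n * (1.0 - p) ** (d - n)


def sequences_repeated(n: int, d: int, s: float) -> float:
    """As in the text: multiply P(N) by D to estimate how many different
    sequences of average spacing s occur exactly n times within d bp."""
    return d * p_repeats(n, d, s)


def fraction_containing(d: int, s: float) -> float:
    """Probability that a d-bp fragment contains the sequence at least once."""
    return 1.0 - p_repeats(0, d, s)


if __name__ == "__main__":
    for s in (25_000, 50_000, 100_000, 200_000):
        print(
            s,
            round(fraction_containing(5000, s), 4),
            round(1.0 - exp(-5000 / s), 4),  # Poisson approximation, for comparison
            round(sequences_repeated(2, 5000, s), 3),
        )
```

Here 1 - P(0) is the chance that a D-bp fragment contains the sequence at least once, the kind of percentage listed in Table 2; the source does not state that Table 2 was computed exactly this way.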



Table 3

Number of Oligonucleotide Sequences with S in
the Range of 25,000 to 200,000 bp and Repeated
2 and 3 Times in 5000 bp and 10,000 bp Fragments
From Table 3 it is possible to estimate the minimum required number of ONPs of a defined length and C+G composition necessary for successful reading of sequences in 5 kbp or 10 kbp long fragments. The frequency of two kinds of sequences is essential for such reading: the frequency of the sequences complementary to the ONPs, and the frequency of the sequences by which ONPs overlap with neighboring ONPs. This will be explained using the example of 11-nucleotide ONPs.

For about [...] ONPs of 11 bp length, every 11-mer in the DNA fragment is detected, and the overlap is always equal to 10 bp. It can be seen from Table 1 that the average distance between the most frequent 11-mers is 252,000 bp. If all 4990 11-mers in the 5000 bp fragment were equally frequent, only one of them would be likely to be repeated (Table 2). Because there are probably no 5000 bp long sequences with 90% or more of A+T, this means that a more significant repetition of 11-mers in the 5000 bp fragment, and probably also in the 10,000 bp fragment, can occur only for nonrandom reasons. With regard to the overlapping sequences (10-mers), the highest frequency occurs for an average S of 55,000 bp (Table 2), and 2-fold repetition within 5000 bp would occur for a maximum of five such sequences, and probably for an average of about two sequences.

The ease and accuracy, namely the unambiguity, of the reading of sequences depend on the number of repetitions of overlapping sequences. Let us imagine reading as a two-directional progression from one or more randomly selected ONPs among all ONPs capable of hybridizing to the given genomic DNA fragment. For each starting 11-mer, we look for the left and the right base pair by searching among the hybridized ONPs for the 10-mer that lies to the left and to the right of the starting 11-mer. When, after a certain number of reading steps, a 10-mer is found which, because it is repeated in the given sequence, is present in more than one 11-mer, the reading in this direction must be interrupted, because we do not know which of the detected base pairs are in the continuation of the sequence and which are at some other location. By reading in the other direction, this interruption will be overcome. Considerable repetition of overlapping sequences, however, will make reading more difficult, and it may even become impossible to overcome the interruption.
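The reading logic described above can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the authors' program: the hybridization result is taken to be an error-free set of k-mers from one strand, the complementary strand is ignored, and all function names are hypothetical. Reading to the left is done by running the rightward reader on reversed strings, which preserves the overlap relation.

```python
from collections import defaultdict


def spectrum(seq: str, k: int) -> set[str]:
    """All k-mers present in seq.
    Assumption: an idealized, error-free hybridization result for one strand."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def read_right(start: str, probes: set[str]) -> str:
    """Extend start one base at a time to the right.

    At each step, look for hybridized k-mers whose first k-1 bases equal the
    last k-1 bases read so far. Stop when there is no continuation, or when
    the (k-1)-base overlap occurs in more than one k-mer (ambiguous)."""
    k = len(start)
    by_prefix = defaultdict(list)
    for p in probes:
        by_prefix[p[:-1]].append(p)
    seq, used = start, {start}
    while True:
        candidates = [p for p in by_prefix[seq[-(k - 1):]] if p not in used]
        if len(candidates) != 1:
            return seq
        used.add(candidates[0])
        seq += candidates[0][-1]


def read_both(start: str, probes: set[str]) -> str:
    """Read to the right, then to the left (via reversed strings), and join."""
    right = read_right(start, probes)
    left = read_right(start[::-1], {p[::-1] for p in probes})[::-1]
    return left[:-len(start)] + right


print(read_both("TTGC", spectrum("ACGTTGCATG", 4)))     # ACGTTGCATG
print(read_right("TTAC", spectrum("TTACGAACGCC", 4)))   # TTACG
```

In the second call the 3-base overlap ACG occurs in two 4-mers (ACGA and ACGC), so reading stops, which is the interruption the text describes.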

On the basis of the [...] repeatability, it is possible to estimate the lowest number of 11-nucleotide ONPs required to prevent the interruption of sequence reading or the ambiguous linking of the read fragments. By reducing the number of ONPs, the overlapping sequence is shortened and its repeatability is thus increased. By synthesizing a larger number of the more frequent 11-mers (containing more A and T) and a smaller number of those with more C and G, it is possible to achieve the same optimal repeatability of overlapping sequences, although of different lengths. Assuming that the maximum repeatability of overlapping sequences still compatible with successful reading is about 20 sequences repeated twice, for the sequencing of 5000 bp fragments the average distance between overlapping sequences must not be less than 50,000 bp. This means that the following needs to be synthesized: all ONPs with one or no C or G (this gives an overlap length of 10 bp); every other 11-mer with C+G from 2 to 4 (this gives an overlap length of 9 bp); every third 11-mer with C+G from 5 to 7 (this gives an overlap length of 8 bp); and every fourth 11-mer with C+G greater than 7 (this gives an overlap length of 7 bp). The total number of ONPs thus selected would be about 1.5 × 10^6. In our opinion, computer simulation would show that even one half of this number of 11-nucleotide ONPs would be sufficient. The sequencing of 10,000 bp fragments would require about [...] 11-mers. If 12-mers were used, this number would be at least three times higher.
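The size of this selection can be counted exactly. The sketch below interprets "every other / every third / every fourth" as keeping one half, one third and one quarter of the 11-mers in each C+G class; that is one straightforward reading of the rule, since the text does not say which members of a class are kept. An 11-mer with g C/G bases can be formed in C(11, g) · 2^g · 2^(11-g) = C(11, g) · 2^11 ways. The function names are hypothetical.

```python
from math import comb


def keep_every(gc: int) -> int:
    """Return d such that every d-th 11-mer with this C+G count is kept
    (assumption: 'every other' = 1/2, 'every third' = 1/3, and so on)."""
    if gc <= 1:
        return 1   # all ONPs: overlap length 10 bp
    if gc <= 4:
        return 2   # every other ONP: overlap length 9 bp
    if gc <= 7:
        return 3   # every third ONP: overlap length 8 bp
    return 4       # every fourth ONP: overlap length 7 bp


def count_by_gc(n: int = 11) -> dict[int, int]:
    """Number of n-mers with exactly gc C/G bases: choose the C/G positions,
    then 2 choices (C or G) for each of them and 2 (A or T) for the rest."""
    return {gc: comb(n, gc) * 2 ** n for gc in range(n + 1)}


def selected_total(n: int = 11) -> int:
    """Total number of ONPs kept under the selection rule."""
    return sum(count // keep_every(gc) for gc, count in count_by_gc(n).items())


print(selected_total())  # 1562624
```

All divisions are exact here, so under this reading the selection contains 1,562,624 of the 4,194,304 possible 11-mers.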

For easier reading, the synthetic ONPs can be arranged by starting from one or several ONPs and proceeding over the overlapping parts. The ONPs thus arranged would be marked by letters in alphabetical order and by increasing numbers. Such marking would make it possible to arrange the ONPs that hybridize to a given genomic DNA fragment into one or several arrays, which would then be converted into the DNA sequence simply by deciphering.




The replication of genomic DNA fragments of defined length can be accomplished in two ways: a) by cloning and b) by amplification.

It can be seen from the foregoing analysis that the maximum length sequenced with a reasonable number of ONPs is about 10,000 bp and that 5000-[...]000 bp is a better length. Plasmid vectors are most advantageous for cloning fragments of these lengths. To create a complete genomic library, these vectors, because of their lower transformation efficacy compared to phage vectors, require 20 to 100 µg of genomic DNA, which is not a major requirement for the one-time creation of the library. For a better representation of genomic DNA, it would be necessary to generate 5000 bp long fragments by partial digestion with two to three common enzymes (Sau3A, DdeI, AluI). To reduce the effect of any "toxic" and repetitive sequences (in this respect, plasmid vectors have an advantage over phage or cosmid vectors), it is necessary to form a library in two vectors. In our opinion, plasmids of the pUC and pAT series are most advantageous for this purpose, because they multiply well and are relatively small.

The sequencing of cloned fragments by hybridization can be accomplished in two ways: by colony hybridization and by dot blot hybridization of isolated plasmid DNA. In both cases, 2000 to 3000 different ONPs represented in the vector sequence cannot be utilized, i.e., they will not even be synthesized.

Colony hybridization is probably faster and less expensive than dot blot hybridization, but it requires specific conditions to eliminate the effect of hybridization with bacterial DNA. To reduce general background noise, the labeling of the probes should give high sensitivity in hybridization, because in this manner very small colonies could be used. ONP labeling should in any case be by biotinylation, because labeling in the last synthesis step is easy and lasting. The sensitivity achieved in this case (Al-Hakim, A.H., and Hull, R., Nucleic Acids Research 14, 9965-9976 (1986)) makes it possible to utilize at least 10 times fewer colonies than are required by the standard method.

To avoid false positive hybridizations caused by homology of the ONP with bacterial sequences, and to utilize short probes such as the 11-mers, which on average are repeated twice in the bacterial chromosome, it is necessary to use vectors giving a maximum number of copies per cell. It is known that by additional amplification on chloramphenicol, pBR322 can produce 300 to [...]00 copies per bacterial cell (Lin-Chao, S., and Bremer, H., Mol. Gen. Genet. 203, 150-153 (1986)). The replication efficacy of the plasmids pAT and pUC is at least twice as high (Twigg, A.J., et al., Nature 283, 216-218 (1980)), so that we can assume that under optimum conditions even [...]00 plasmid copies can be produced per cell. Because of the load represented by the introduced sequence, the chimeric plasmids will certainly not multiply as well, particularly in the presence of more toxic sequences. For this reason, [...] about 200 copies of chimeric plasmid per cell. This means that, on average, with [...]



[...] located on the [...]. This signal would be 100 times stronger [...] complementary sequence [...].

[...] hybridization with the bacterial DNA [...].

[...] will be repeated in the bacterial [...].

By using the binomial distribution, we determined how many ONPs [...]. [...] such ONPs would give unreliable information [...] more than 10 times as a result of random distribution. Such ONPs [...] by use of equation (2), where D is the length of the bacterial chromosome.

Table 4 shows the results obtained [...] nucleotides.

The calculation assumes that [...] D is [...] × 10^6 bp and S is the number of different ONPs [...] the DNA of E. coli, which is almost entirely the case. [...] uniformly represented in the [...].




[...] that the 11-mer will be repeated more than 13 [...].

It can be seen from Table 4 that [...] cannot be [...] DNA. The previously determined number of 11-mers will [...] because recombinant bacteria do not tolerate [...] of such 11-mers will be [...]. [...] cannot be utilized for hybridization [...] bacterial DNA.

[...] sequences of 50 bp or longer between bacterial and [...] DNA [...] in the colony [...].
[...] the combination [...] positive [...]. [...] which ONP or ONPs [...] then increases the required amount of [...]. [...] probe must be present in several combinations. With this increase [...] hybridization, and the [...] concentration of ONP, however, because for successful and fast hybridization it is [...] very low, so that the concentration is [...] in the hybridization liquid, and because probe consumption is very [...] to be hybridized in a few portions of smaller volume [...] only [...] reduced after hybridization. A larger [...] on the same hybridization band requires [...].

[...] ONP in three combinations so that none of the other [...]. By using 30 ONPs per hybridization and by repeating each ONP in three combinations, the number of hybridizations is reduced tenfold. [...] probes [...] present in two of the three combinations [...].
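A pooling scheme with these properties can be sketched in Python. The layout below (rows, columns and wrapped diagonals of a square grid of probes) is our illustrative choice, not a design given in the source: with 30 probes per pool, each probe lies in exactly three pools, and any two probes share at most one pool, so the other probes of a probe's three pools are all different. The decoding keeps a probe only if all three of its pools are positive, which corresponds to the first elimination rule of Claim 2; the second rule is not modeled. Function names are hypothetical.

```python
import random


def make_pools(m: int):
    """Hypothetical layout: probe (i, j) of an m x m grid goes into row pool i,
    column pool j and wrapped-diagonal pool (i + j) mod m.
    Result: 3*m pools of m probes; every probe is in exactly 3 pools."""
    pools = [set() for _ in range(3 * m)]
    membership = {}
    for i in range(m):
        for j in range(m):
            probe = i * m + j
            ids = (i, m + j, 2 * m + (i + j) % m)
            membership[probe] = ids
            for pool_id in ids:
                pools[pool_id].add(probe)
    return pools, membership


def decode(positive_probes: set, pools, membership) -> set:
    """A pool is positive if any of its probes hybridized; a probe is called
    positive only if all three of its pools are positive (elimination)."""
    pool_positive = [bool(pool & positive_probes) for pool in pools]
    return {p for p, ids in membership.items() if all(pool_positive[x] for x in ids)}


pools, membership = make_pools(30)      # 900 probes in 90 hybridizations
truth = set(random.Random(1).sample(range(900), 5))
called = decode(truth, pools, membership)
print(len(pools), sorted(truth), sorted(called))
```

With 900 probes this needs 90 pooled hybridizations instead of 900, i.e. the tenfold reduction mentioned above; the called set always contains the true positives and may contain a few extra probes when many pools light up.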

[...] ONP repeated. Based on the probability that the [...] of a defined [...], a three times larger amount [...], we determined the percentage of [...] number of ONPs hybridizing to one fragment of genomic DNA of defined length.

The average distance between homologous sequences for 30 ONPs with 11 nucleotides is about 130,000 bp. The average [...] reading, proportionally more ONPs [...]. [...] the average distance (S) would be about 100,000 bp. By using [...] ONPs will hybridize to a genomic DNA fragment of [...].

We determined the probability that a combination of 30 ONPs will hybridize to a [...]. [...] the probability that three different combinations will hybridize [...]. This probability is 0.0415. The probability [...] length D = 5000 bp [...] hybridized, is about 250 colonies [...].

[...] 25 × 10^[...]. Since 2 million colonies (fragments) are being hybridized, [...] to the same fragment [...]. [...] all three combinations that have a common ONP will hybridize to it through one of their probes. For these colonies, we will not know whether they contain the sequence complementary to the common ONP. [...] the ONP sequence that is common to the three [...] genome, the number of clones that contain at least one complementary ONP [...] simultaneously hybridize with the common [...]. [...] 10,000, the number of colonies that will also [...] combinations is 300 [...] 30,000 [...] can be less than four. For one million different [...] whether it hybridizes or not [...] obtained by hybridization with each ONP separately.
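One possible reading of the colony estimate above can be reproduced with a back-of-envelope Python sketch. Everything here is an assumption for illustration: the chance that a 30-ONP combination hybridizes to a D-bp fragment is approximated by the Poisson expression 1 - exp(-D/S) with S ≈ 100,000 bp, the three combinations sharing one probe are treated as independent, and 2 million colonies of D = 5000 bp are screened. The function names are hypothetical.

```python
from math import exp


def p_combination(d: int, s: float) -> float:
    """Poisson approximation (assumption) of the chance that a d-bp fragment
    contains a site for at least one ONP of a combination whose sites are,
    on average, s bp apart."""
    return 1.0 - exp(-d / s)


def expected_ambiguous(colonies: int, d: int, s: float, pools_per_probe: int = 3) -> float:
    """Expected number of colonies in which all combinations sharing one probe
    hybridize by chance, assuming the combinations behave independently."""
    return colonies * p_combination(d, s) ** pools_per_probe


print(round(expected_ambiguous(2_000_000, 5000, 100_000)))  # 232
```

Under these assumptions the estimate is roughly 230 colonies, the same order of magnitude as the figure of about 250 colonies in the text; the exact calculation used by the authors is not recoverable from the source.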



[...] combination contains at least one ONP that hybridizes [...] combinations involved when one [...]. When the combinations [...] are large, the probability is higher that different probes [...] hybridization [...] would [...] ONP probably does not hybridize [...], hence [...] would not be [...]. [...] we consider [...] ONP [...] does not hybridize [...]. [...] can be estimated approximately by [P(D)]^[...] × K, where K is the number of ONPs [...] and P(D) is the probability that one of the K ONPs hybridizes to one DNA fragment of length D. The formula is valid for [P(D)]^[...] × K < 1. When fragments of length D = 5000 bp are sequenced, with combinations having K = 30 ONPs, 0.1% of the information is lost; with K = 40 ONPs, 0.[...]% is lost; with K = 45 ONPs, 1.[...]% is lost; with K = 50 ONPs, [...]2% is lost; with K = 60 ONPs, 16% of the information is lost. It can be concluded that a 10-15-fold reduction in the required number of hybridizations can be achieved with a small loss of information. In this way, the number of required filters, namely the number of replications to be made of 2 million clones, would also be smaller.

The total number of hybridization points can be reduced by using a few hybridization steps with large combinations. [...] hybridizations [...] points, each point could be [...] with [...] times fewer hybridizations; namely, the [...] of the filter could thus be reduced [...] times.

By hybridization of the isolated plasmid DNA, the [...] procedure would be [...], but it [...] to isolate a sufficient [...]. The number of clones with a 5000 bp fragment for 3-fold coverage of the mammalian genome is 2 × 10^6. The required amount of DNA from each clone (Mp) is given by the product



$$M_p = \frac{D_p}{D_o} \times B_h \times \frac{1}{B_r} \times M_d$$

where Dp is the size of the chimeric plasmid in bp, Do is the ONP length, Bh is the number of required hybridizations, Br is the number of rehybridizations of the same filter, and Md is the amount of DNA that can be detected by the hybridization procedure. Taking the most probable values, namely Dp = 8000, Do = 11, Bh = 2 × 10^4, Br = 10 and Md = 0.1 pg, we find that it is necessary to isolate about 0.2 µg of DNA for each chimeric plasmid. Successful rehybridization of filters that have been hybridized with a biotinylated probe has not been developed to date. On the other hand, there are indications that with biotinylated probes it is possible to detect as little as 0.001 pg. Hence, from each of the 2 × 10^6 clones it is necessary to isolate about 0.1 to 1 µg of plasmid DNA.
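The product for Mp can be evaluated with a minimal Python sketch. The input values are the "most probable values" listed in the text, with Bh taken as 2 × 10^4 (the exponent is partly illegible in the source; 10^4 is an assumption consistent with the stated result). The function name is hypothetical.

```python
def dna_per_clone_pg(dp_bp: float, do_nt: float, bh: float, br: float, md_pg: float) -> float:
    """Mp = (Dp / Do) * Bh * (1 / Br) * Md, returned in picograms.

    dp_bp: chimeric plasmid size (bp); do_nt: ONP length (nt);
    bh: required hybridizations; br: rehybridizations per filter;
    md_pg: detectable DNA amount per hybridization point (pg)."""
    return (dp_bp / do_nt) * bh * (1.0 / br) * md_pg


# Assumption: Bh = 2e4.
mp_pg = dna_per_clone_pg(dp_bp=8000, do_nt=11, bh=2e4, br=10, md_pg=0.1)
print(f"{mp_pg / 1e6:.3f} ug per clone")  # 0.145 ug per clone
```

With these inputs the product is about 0.15 µg per clone, the same order as the figure of about 0.2 µg quoted in the text.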

The amplification of the entire genomic DNA can be accomplished in about one million portions of up to about 10,000 bp, whereby the genome would be covered more than three times. This is accomplished by means of an appropriately chosen mixture of oligonucleotides as primers (our Patent Application No. [...] of March [...], 1987). With about 50,000 different oligonucleotides whose complementary sequence is repeated 800 times in the nonrepetitive part of the mammalian genome (for example, 12-mers with C+G from [...] to [...]), it is possible to carry out one million amplification reactions with combinations containing 50 primers, so that each primer enters the same combination with every other primer only once. With such primer combinations, there will be an average of 60 sites in the genome where two primers are oriented so that their 3' ends face each other and are separated by a distance of less than 300 bp. The fragments between these primers will be amplified. Because their average length is 150 bp, the total length of DNA amplified in one reaction is about 9000 bp. One million such amplification reactions replace the plasmid and phage library of the mammalian genome. In the amplification, it is not possible to utilize primers that enter into highly repetitive sequences (those repeated more than 2000-3000 times); hence only the nonrepetitive part of the genome is amplified and sequenced. In addition, with 50,000 primers with a frequency of 800 in the nonrepetitive part of the genome, about 10% of this part of the genome would not enter into the amplification units. With 100,000 primers, only [...]% of the nonrepetitive part of the genome would remain unamplified. With 100,000 primers, it is necessary to carry out 4 million amplification reactions.
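The reaction counts follow from a simple pair-counting argument, sketched below in Python. The sketch assumes a design in which every pair of primers meets in exactly one reaction; it computes the resulting count (all primer pairs divided by the pairs covered per reaction) and does not address whether such an exact design exists for these parameters. The function name is hypothetical.

```python
from math import comb


def reactions_needed(primers: int, per_reaction: int) -> float:
    """Number of reactions if every pair of primers meets in exactly one
    reaction: total primer pairs / primer pairs covered by one reaction."""
    return comb(primers, 2) / comb(per_reaction, 2)


print(round(reactions_needed(50_000, 50)))    # 1020388
print(round(reactions_needed(100_000, 50)))   # 4081592
print(60 * 150)                               # amplified bp per reaction: 9000
```

These values match the text's figures of about one million reactions for 50,000 primers and about 4 million for 100,000 primers, and 60 products of 150 bp give the 9000 bp amplified per reaction.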

By dot blot hybridization of the amplification reactions with the oligonucleotides that served as primers and with newly synthesized ONPs, up to the required number of about one million, only the sequences of the amplified fragments would be read, because with a 2 × 10^[...]-fold amplification each ONP having a complementary sequence in the amplified fragment would have a 3-1000 times larger number of targets than if it hybridized only to the homologous sequences in the unamplified part of the genome. Only a three times stronger signal is expected for 11-nucleotide ONPs that do not contain C or G, and a 1000 times stronger signal for 12-mers without A or T. It can be seen from this analysis that for sequencing regions rich in A and T it is possible to utilize ONPs longer than 11 bp (the 12-mer would give a signal 10 times stronger than the background noise). In this case, it is impossible to utilize ONP combinations for hybridization, because the signal would be equal to the background noise, and no possibility exists for selective prehybridization.

The advantage of amplification over cloning is that no living material is used. This procedure is much more expensive, however, because each primer is consumed in a 10 times larger quantity than if it were used only as a probe. Moreover, about 10^[...] to 10^[...] enzyme units of the Klenow fragment of DNA polymerase I are required.

Because each genomic DNA fragment is hybridized with all probes, it is necessary, if there is no rehybridization and if probe combinations are not used for hybridization, to apply each colony, each isolated DNA or each amplification reaction to about one million filters for about one million probes. This would be done by simultaneous automatic application of a large number of samples (about 100). With DNA, this is much easier than making colony replicas. Most likely, the colonies would be automatically seeded into about one million replicas of each of the approximately 2 million clones by taking a minimum amount of bacteria from clones grown on microtitration plates. To avoid picking colonies from Petri dishes, the transformation mixture can be diluted by seeding the specified volume into a well of the microplate so that one or no transformed cell is seeded. To eliminate empty wells, a transinoculation would then be performed from wells with viable growth to a new plate. The most difficult condition is the need to achieve approximately the same growth of all colonies on all filters.

If there is no rehybridization and if probe combinations are not used, the total number of hybridization points equals the product of the number of genomic DNA fragments (colonies, clones, amplification reactions) and the number of ONPs. For mammalian genomes, this amounts to about 10^12 points. If each point requires about 3 mm², about 3 × 10^6 m² of filters will be needed. With 10 rehybridizations and a 20-fold reduction in the number of hybridizations per fragment, about 15,000 m² of filters is required.
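The filter-area arithmetic can be written out as a small Python sketch. The inputs are round figures (10^12 points, about 3 mm² per point, 10 rehybridizations, 20-fold fewer hybridizations per fragment); the area per point is partly illegible in the source and is assumed to be 3 mm², the value that reproduces the totals of about 3 × 10^6 m² and 15,000 m². The function name is hypothetical.

```python
def filter_area_m2(points: float, mm2_per_point: float,
                   rehybridizations: int = 1, pooling_reduction: int = 1) -> float:
    """Filter area in m^2: hybridization points times area per point,
    divided by the reuse factors (rehybridizations, pooling)."""
    mm2 = points * mm2_per_point / (rehybridizations * pooling_reduction)
    return mm2 / 1e6  # 1 m^2 = 10^6 mm^2


# Assumption: about 3 mm^2 per hybridization point.
print(filter_area_m2(1e12, 3))           # 3000000.0
print(filter_area_m2(1e12, 3, 10, 20))   # 15000.0
```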

Hybridization with all ONPs of the same length would be carried out at the same temperature under conditions that eliminate the effect of the C+G composition (Wood, W., et al., Proc. Natl. Acad. Sci. 82, 1585 (1985)). For 11-mers, the hybridization and the washing would be carried out at 20 °C. For biotinylated probes, which require about 1 ng of probe per cm² of filter, an amount of one to three optical units of each ONP (50 µg) would be sufficient for the sequencing of a mammalian genome, provided that the hybridization liquid is used only once. By simultaneous synthesis of 10 optical units and possibly by simultaneous hybridization, the sequencing of individual genomes could be simplified, [...] and made less expensive.



The cost of sequencing a genome the size of a mammalian genome would not exceed 100 million dollars. This is 5 times less than the cost estimated within the framework of the Japanese project. We also believe that the total time needed for sequencing a genome, including ONP synthesis, is shorter and amounts to about 1-2 years.



Because as many genomic fragments are taken for sequencing as are necessary for each fragment to overlap the neighboring one at least slightly, by overlapping the sequenced fragments over homologous sequences at their ends one obtains an arranged library of fragments (clones) and the sequence of each chromosome. This is not so in the sequencing of amplified fragments, because it is possible to amplify and sequence only fragments that do not contain, or do not belong to, repetitive sequences. In this case, by arranging the sequenced fragments, one would obtain only the regions between neighboring repetitive [...].
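Ordering fragments by their overlapping end sequences can be illustrated with a greedy merge in Python. This is a common textbook technique chosen here for illustration; the source does not specify an algorithm. The sketch assumes error-free fragments read from the same strand, ignores fragments contained entirely within others, and uses a hypothetical minimum overlap `min_len` to avoid joining fragments on chance matches. Repeats longer than `min_len` could still mislead it, which is the limitation the text notes for repetitive regions.

```python
def longest_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b,
    or 0 if there is no such overlap of at least min_len bases."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(fragments: list[str], min_len: int = 8) -> list[str]:
    """Greedy assembly: repeatedly merge the ordered pair of contigs with the
    longest end overlap until no overlap of at least min_len remains."""
    contigs = list(fragments)
    while len(contigs) > 1:
        best = (0, -1, -1)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = longest_overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs


genome = "GATTACAGGCTTAACCGGTATGCCATAGGA"
fragments = [genome[18:30], genome[0:14], genome[10:24]]
print(assemble(fragments, min_len=4))  # ['GATTACAGGCTTAACCGGTATGCCATAGGA']
```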



It appears that the optimum procedure for sequencing genomes by hybridization with ONPs is colony hybridization of clones larger than [...]000 bp with about 300,000 to [...]00,000 ONPs of 10-11 nucleotides, in about [...]0,000 separate hybridizations with combinations of about 30-50 ONPs, so distributed that each ONP is repeated in three combinations, wherein the other 90 ONPs are represented in only one of the three given combinations. However, the possibility of detecting a very small amount of DNA with biotinylated probes, increasing the number of rehybridizations of a filter, and reducing the number of hybridizations per fragment by elimination hybridizations with combinations of about 100-500 ONPs make it possible to reduce the required amount of DNA to less than 1 µg. This quantity of plasmid can be isolated from bacterial cultures grown in one well of a microtitration plate. We can also envisage a simple, crude isolation of plasmid DNA, which could even be easier than growing colonies in thousands of replications. The entire isolation procedure would be carried out on microtitration plates. Centrifugation in the microtitration plates would remove the medium, and alkaline lysis and denaturation of the proteins with acidic sodium acetate would give a precipitate of cell membranes and chromosomal DNA, which would be removed by centrifugation. The supernatant would be denatured with sodium hydroxide and transferred to the filter as a sufficient number of dots. It would be easy to introduce the steps of alcohol precipitation and treatment of the preparations with the enzyme RNase if it were necessary to reduce the background noise from hybridization with bacterial RNA. This method of isolating plasmid DNA makes dot blot hybridization more advantageous than colony hybridization.



PATENT CLAIMS 



1. The procedure of genome sequencing by hybridization with oligonucleotide probes, characterized in that genomic DNA fragments containing 100 to 20,000 bp, obtained in sufficient amount by cloning in vectors that replicate in E. coli or by amplification of genomic DNA with mixtures of oligonucleotide primers, are hybridized, by the procedure of colony hybridization without or with selective prehybridization of nonbiotinylated bacterial DNA, or by dot blot hybridization of isolated chimeric DNA, or by dot blot hybridization of DNA from amplification reactions, under hybridization conditions permitting only the hybridization of sequences with complete homology, to [...]00,000 to [...],000,000 biotinylated oligonucleotide probes of different lengths ranging from 10 to 13 nucleotides, the probes being hybridized separately or in combinations of 10 to [...]00 probes, and that the groups of oligonucleotide sequences found in individual genomic DNA fragments are determined by detection of the bound biotin, the arrangement of [...] over overlapping sequences then giving the order of the nucleotides in said fragments.

2. The procedure according to Claim 1, characterized in that, from the combinations of oligonucleotide probes that hybridize to one fragment, one deduces the oligonucleotide probes that produce hybridization by eliminating those probes whose other combinations, in which they are present, do not hybridize to the given fragment, or in all of whose combinations [...] present at least one additional probe, all combinations of which that contain it hybridize to the given fragment.

3. The procedure according to Claims 1 and 2, characterized in that the oligonucleotide probes that hybridize to the given genomic DNA fragment are arranged into one or several arrays in the alphabetical order of the lettered part and in increasing order of the numbered part of their markings, which they acquired on the basis of possible overlap with the utilized probes, and that the arrays of markings are deciphered by means of a reverse algorithm to give the order of nucleotides.

4. The procedure according to Claims 1 through 3, characterized in that, by detecting identical, overlapping, terminal sequences between sequenced fragments, the sequenced clones or amplified fragments are arranged in an array, and the overall sequence of each chromosome of the given genome is determined.

Applicant 
(Signed)

Prof. Dr. Vladimir Glišin
Director 
ABSTRACT

[...] DNA fragment is determined [...] of each individual [...]

[...] overlapping [...]

[...] sufficiently automated procedure [...]



INDUSTRIAL USE OF THE DEVELOPED GENOME SEQUENCING PROCEDURE

[...]

In [...] economic [...] would [...] a [...] number of scientists to study [...].






